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We consider Bayesian hierarchical models for survival analysis, where the survival times are 
modeled through an underlying diffusion process which determines the hazard rate. We show 
how these models can be efficiently treated by means of Markov chain Monte Carlo techniques. 
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1. Introduction 

Diffusion processes have found many applications in the modeling of continuous-time 
phenomena, for problems related to a variety of scientific areas, ranging from economics 
to biology, from physics to engineering. Here, we use diffusion processes as building blocks 
for the definition of models for survival and event history analysis. This idea is not new 
(see, e.g., the reviews in Aalen and Gjessing (2001, 2004)). However, in this paper, we 
are able to considerably extend the flexibility of the diffusion models used, by adopting 
powerful Markov chain Monte Carlo techniques. 

Diffusion models for survival analysis have been proposed because, as summarized 
in Aalen and Gjessing (2004), "when modelling survival data it may be of interest to 
imagine an underlying process leading up to the event in question." Such a process 
might, for example, represent the development of a disease. Two types of models have 
been considered in the literature: models where the event happens when a diffusion pro- 
cess hits some barrier and models where the hazard rate is some suitable function of 
the diffusion. For the former type of model, we refer the reader to Aalen and Gjessing 
(2001), Aalen, Borgan and Gjessing (2008) and references therein. Here, we are inter- 
ested in the latter. Woodbury and Manton (1977) proposed a model where the hazard 
rate is a quadratic function of an Ornstein-Uhlenbeck diffusion process. This model 
has since been considered by several authors, including Myers (1981), Yashin (1985), 
Yashin and Vaupel (1986) and Aalen and Gjessing (2004). For given values of the pa- 
rameters of the Ornstein-Uhlenbeck process, survival distributions and hazards are stud- 
ied. Myers (1981) focuses on survival distributions conditioned on initial covariate val- 
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ues; Yashin (1985) and Yashin and Vaupel (1986) use hazards based on quadratic func- 
tions of Ornstein-Uhlenbeck processes in order to model heterogeneity among groups 
and individuals, and to study the relative hazard functions and survival distributions; 
Aalen and Gjessing (2004) derives quasi-stationary distributions. Obtaining such analy- 
tical results for hazard functions other than quadratic functions, or for more complex 
diffusion processes, is not feasible. 

In our paper, we adopt a Bayesian approach and show how these models can be ef- 
ficiently treated by means of Markov chain Monte Carlo techniques for general choices 
of diffusion processes and hazard functions. For instance, by the proposed methods, it 
is possible to deal with latent diffusion models which are stochastic perturbations of 
common survival models. We also consider the case of multiple groups of observations, 
typical of clinical trials, and we show how to efficiently deal with covariates. We illustrate 
the methods via simulation studies and applications to real-world data. 

It should be mentioned that other classes of Bayesian nonparametric and semi- 
parametric models for survival analysis have been proposed in the literature. Among the 
most important, we mention the models based on neutral to the right random probabilities, 
whose cumulative hazard rates are processes with independent increments (see Doksum 
(1974) and Ferguson (1974) for the definition and properties of these random mea- 
sures, and, e.g., Susarla and Van Ryzin (1976), Kalbfleisch (1978), Ferguson and Phadia 
(1979), Hjort (1990) and Damien and Walker (2002) for applications in survival analysis), 
and all models falling within the framework of multiplicative intensity models, whose haz- 
ard rates are mixtures of known kernels where the mixing measure is a weighted gamma 
process (see Dykstra and Laud (1981), Lo and Wcng (1989), Ishwaran and James (2004) 
and references therein). 

The paper is organized as follows. In Section 2, we recall the essentials of diffusion 
processes and introduce the model; we also outline how, in the described framework, it 
is possible to consider stochastic perturbations of common survival models. In Section 3, 
we describe the MCMC scheme and gives the details of a suitable Hastings-within-Gibbs 
algorithm, showing its implementation by means of a toy example. In Section 4, we 
present improved versions of the algorithm, based on rcparamctrizations of the model. 
In Section 5, we discuss a straightforward generalization of the framework developed 
in the previous sections and deal with the case of multiple groups of observations; this 
is also illustrated by application to a data set from a clinical trial, one that has been 
considered in a number of papers in the context of survival analysis, the famous paper 
by Cox (1972) being among the earliest. In Section 6, we describe how covariates can 
be efficiently included in the proposed models and give an illustrative application to the 
lung cancer data set analyzed by Muers, Shevlin and Brown (1996). Finally, in Sections 
7 and 8, we discuss possible extensions of the models considered. 

2. Latent diffusion models 

Let be a random variable with values in M. d . Denote by C([0,oo),K) the space of 
continuous functions from [0, oo) to K and by C its cylinder c-algebra. Given = 6, 
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consider the scalar diffusion process X = {X t : t > 0}, solution of a stochastic differential 
equation (SDE) of the form 

dX t =P(X t ,9)dt + o-dB t , t>0, 

(1) 

X = x , 

driven by the standard scalar Brownian motion B = {B t : t > 0}. The Brownian motion 
B and the diffusion process X are random elements of (C([0, oo),R),C). The diffusion 
coefficient a is assumed constant and known, for the moment. The more technically 
difficult case of unknown a is postponed to Section 7. The drift (3(x,9) is assumed to be 
jointly measurable in x and 6, and to satisfy the regularity conditions (locally Lipschitz, 
with linear growth bound) that guarantee the existence of a weakly unique global solution 
to (1). Sec, for example Rogers and Williams (2000), Chapter V.24. 

Let Wo- be the law of aB and, for a given 9, denote by Pg the law of the diffusion 
X, solution of (1). By Girsanov's theorem, the Radon-Nikodym derivative of Pg with 
respect to W CT is given by 

HP, f [°° p(x t ,Q) A 1 f°° P(x t ,9) 2 ^ 



where x is an element of (C([0,oo),R),C). See, for example, Rogers and Williams (2000), 
Chapter V.27. 

Similarly, for a finite T, denote by C([0,T],R) the space of continuous functions from 
[0,T] to R and by C T its cylinder er-algebra. Then, B [0 m '■= {B t : < t < T] and X [a , T] = 
{X t : 0<t<T} arc random elements of (C([0,T],R),C^). LetW r . ff be the law of aB [0 , T ] 
and, for a given 9, denote by Pt,s the law of ^[o,t]- Then, by Girsanov's theorem, the 
Radon-Nikodym derivative of Px,e with respect to Wt, 5 is given by 

d¥ T ,e , , f f T P(x u 0) , 1 [ T P(xt,6)* M \ 



T,cr 

and, for each T, the measures Pt,s are absolutely continuous. 

Given the diffusion X, let us consider the random distribution function Fx.h on [0, oo), 
defined as 

F x ,h{t):=\-exp{- j h(X s )ds}, t>0, (3) 

where h(-) is some suitable non- negative and continuous function with h(X s ) ds = oo 
almost surely. The function h(-) plays the role of the hazard function and h(X t ) is the 
random hazard rate at time t associated with the random distribution Fx,h- 

Two features of the random measure Fx,h have to be noted. The first is that the 
hazard inherits the Markov property of the diffusion process so that the hazard at a 
future time t' depends only on the hazard at the present time t. Indeed, the Markov 
property seems a sensible choice to make at the level of the hazard. The second is that the 
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cumulative hazard is a process with positively correlated increments, being the integral 
of a continuous process. The latter feature is natural in many contexts and it introduces 
to the model a concern with the stochastic process that clearly must lie behind the 
occurrence of events. In words, a high increment of the cumulative hazard over the 
time interval [t,t'\ means that the underlying stochastic process has reached a region 
of high risk and this is likely to yield a high increment of the cumulative hazard over 
a close (disjoint) time interval. The strength of this positive correlation, and thus the 
smoothness of the cumulative hazard, depends on the choice of the hazard function h 
and of the diffusion process X: the rougher the diffusion, the weaker the correlation, 
and vice versa; see also the comments in Section 8. Note that the property we have just 
highlighted differentiates the models we are considering from models based on neutral to 
the right random probabilities, whose cumulative hazards are processes with independent 
increments and thus have an erratic behaviour. 

Let us now consider a sequence of event times Yi,Y%,... which are, conditionally on 
Fx,h, independent and identically distributed (i.i.d.) with common distribution Fx,h- 
From (3), it follows that the distribution of Y\, . . . ,Y n , given X = x, has density, with 
respect to the n-dimensional Lebesgue measure £ n , given by 



l(yi,...,y n \x) 



\h{X y] ) 



exp 




h(x t )dt 



(4) 



Censored observations can easily be dealt with in this setting. In the present paper, we 
shall restrict our attention to independent right-censored schemes. If we let (yi, . . . , y m ) 
be the observed event times and let (y m +i+, . . . , y n +) be the right-censored event times, 
then the likelihood becomes 



Kvu ■ ■ ■ ,ym,y m +i+, ■ ■ ■ ,y n + \x) 



l[h(c 

i=i 



Vj, 



( m n y . + 

exp / h(x t )dt- V / 

I j=l J ° j=m+l J ° 



h(x t ) dt 



We are thus considering a latent diffusion model for survival analysis, where the sur- 
vival times are modeled via an underlying diffusion process which determines the hazard 
rate. As highlighted by Aalen and Gjessing (2004), this model can also be interpreted 
as a random barrier hitting model. Indeed, the event occurs when the cumulative haz- 
ard strikes a random barrier R, which is exponentially distributed with mean 1 and is 
stochastically independent of X. 



2.1. Stochastic perturbations of common survival models 

In the framework we have described, one possibility is to consider stochastic perturbations 
of common survival models. Hcuristically, the idea is that if we can express the hazard 
r(t) of a given model as a solution of an ordinary differential equation = g(r(t)) 
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for some suitable function g, then we may be able to use g, or some modification of it, 
to model the drift of an SDE. Starting from this SDE, we can thus consider a latent 
diffusion model whose hazard function is a stochastic perturbation of r(t). 

We shall illustrate this by means of some examples. The simplest case is offered by 
the Gompertz model. The Gompcrtz hazard r(t) = /3exp{crf}, for a, (3 > 0, is a solution 
of the ordinary differential equation = g(r{t)) = ar(t). Consider, thus, the latent 
diffusion model based on the SDE having drift g(X t ) = 6X t for 9 > 0, 

dX t = eX t dt + adB t , t > 0, X o = x o >0, (5) 

and with hazard function h(u) = \u\. For a = 0, the SDE (5) reduces to the ordinary 
differential equation written above, for which the Gompertz hazard is a solution, and 
the latent diffusion model reduces to the Gompertz model. Hence, the latent diffusion 
model based on the SDE (5) with hazard function h(u) = \u\ can be seen as a stochastic 
perturbation around a central Gompertz model. This constitutes a simple example of a 
latent diffusion model, for which the law of X t , and thus also the law of the hazard, is 
known. In the other examples we shall now give, the SDE cannot be explicitly solved, but 
the latent diffusion models based on them can be treated by the techniques described in 
the present paper. 

Let us consider the Weibull model, whose hazard r(t) = afit 01 " 1 for a, f3 > is a non- 
trivial solution of the ordinary differential equation ^jr~ = #( r W) ~ ^r(t)^ a ~ 2 ^^ a ~ 1 \ 
Consider, thus, the latent diffusion model based on the SDE 

dX t = 6 l (sign(X t ))\X t \ e2 dt + adB u t>0, X a = x o >0, (6) 

where 

r 1, ifu>0, 
sign(it) = < —1, if u < 0, 
I 0, if u = 0, 

and with hazard function h(u) = \u\. For a = 0, the SDE (6) reduces to the ordinary 
differential equation written above, for which the Weibull hazard is a solution {62 here 
plays the role of (a — 2)/(a — 1)). Hence, the latent diffusion model based on the SDE 
(6), with hazard function h(u) = \u\, can be seen as a stochastic perturbation around 
a central Weibull model. For values of O2 in the interval (0,1), which correspond to 
a > 2, the SDE (6) has a non-explosive solution. This solution is weakly unique (see, 
e.g., Stroock and Varadhan (2006)). In Sections 5.1 and 6.1, we shall implement this 
latent diffusion model in some illustrative applications to real-world data. 

Using the simple idea outlined above, it is possible to develop other latent diffu- 
sion models, such as stochastic perturbations of log-logistic models and exponential- 
power models. The log-logistic hazard (r(t) — a/?i Q_1 /(l + j3t a ) for a,/3 > 0) and the 
exponential-power hazard (r(t) = a/3 Q f a_1 exp{(/3t) Q } for a, (3 > 0) can, in fact, be writ- 
ten as solutions of = g(r(t)) for suitable functions g (when a < 1 for the log- logistic 
and a > 1 for the exponential-power). Let us give a further example, which generalizes 
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the Pareto model. The Pareto hazard r(t) = a/t, for a > and t > A > 0, is a solution of 
the equation ^ = g(r{t)) = -^[r{t)\ 2 . Now, the SDE having drift g(X t ) = -9X?, for 
0>O, 

dX t = -9X?dt + adB t , t>\>0, X x = xx>0, (7) 

provides a stochastic perturbation around the Pareto hazard, but, unfortunately, this 
SDE cannot be used for our purposes since it has an explosive solution. On the other 
hand, we can modify (7), for example, by inclusion of X t in the diffusion coefficient, in 
order to obtain another SDE, 

dX t = -6X?dt + o-X t dB t , t>\>0, X x = x x >0, (8) 

that also provides a stochastic perturbation around the Pareto hazard, but has a non- 
explosive solution. The latter SDE can thus be transformed into one of constant diffusion 
coefficient, which can, in turn, be used in the latent diffusion model. Note that the solution 
of (8), and that of the corresponding SDE with constant coefficient, are almost surely 
positive and so we can take as hazard function h(-) the identity function, obtaining a 
particularly natural perturbation of the Pareto. It is worth recalling that an SDE with 
general diffusion coefficient a(X t ,9), 

dX t =P(X t ,6)dt + a(X t ,9)dB t , t>0, X =x , 

can, in fact, be transformed into an SDE of unit diffusion coefficient for the process Y, 
by applying the 1-1 transformation X t — > r](X t ;6) =: Y t , where r](x;9) = J x ^rg] dz is 
any anti-derivative of cr _1 (-;6') (we are assuming that a(x,9) is diffcrcntiable for any 
x £ C([0,oo),R)); see, for example, Beskos et al. (2006). This approach opens up to a 
number of possible stochastic perturbations of commonly used hazards. 

3. Markov chain Monte Carlo methods for latent 
diffusion models 

Let Pq(9) be the prior density, with respect to C d , of the d-dimcnsional parameter 
which appears in the drift of the diffusion process X, solution of (I). Fix a finite time 
horizon T of interest, with T > y\ n ], where j/r n i := max{yi, . . . ,y n }- The choice of T will 
be discussed in Section 4. Then, the joint posterior distribution of and ^[o,t] nas 
density, with respect to the product measure C d ® Wt,<j, given by 

tt(0> £[o,t] \yi, • • • ,Vn) = Cp e (9)g(x [a ^ T] |6>)%i, . . . , y n \x [0 , y[n]] ), (9) 

where C is a normalizing constant and g(x[QT]\0) '■= ^ T T e (x) is given by Girsanov's 
formula (2). 

A Gibbs sampling algorithm for sampling from (9) alternates between 

1. simulation of 0, conditional on the observations and the current path of XtQ^y, 
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2. simulation of Ipw, conditional on the observations and the current value of 0. 

Note that the parameter and the observations Y\ , . . . , Y n are conditionally indepen- 
dent, given the non-observed process -^[o,t] ■ In particular, from (9), the conditional distri- 
bution of 9 given A^t] nas density, with respect to C d , proportional to Pe{9)g(x[o_T] \9)- 
The update of the parameter is particularly straightforward when a conjugate prior pe(9) 
is chosen so that it is possible to analytically derive the conditional distribution of given 
X[ .T] an d sample directly from it. The second step is computationally more demanding. 
From (9), the conditional distribution of X[o t T\, given parameter and observations, has 
density, with respect to Wt i(Ti proportional to g(xm : T] \9)l(yi, ■ ■ ■ , y n \x) and cannot be 
sampled directly. An appropriate Metropolis-Hastings step is thus required. 

Implementation of the algorithm will necessary involve a discretization of the diffusion 
sample path. When the SDE cannot be solved, it is possible to use Euler-Maruyama ap- 
proximation; see, for example, Chapter 9 in Klocdcn and Platen (1992). Alternatively, it 
may be possible to simulate the diffusion path by means of the exact algorithm described 
in Beskos et al. (2006), thus avoiding approximation errors. 

3.1. Hastings- within-Gibbs algorithm for a latent diffusion model 

We now give the details of the Hastings-within-Gibbs algorithm for latent diffusion mod- 
els. 

Just as an example, consider a latent diffusion model with base diffusion which is 
solution of the SDE 

dX t = 9 T f(X t )dt + adB t , t>0, A =x , (10) 

with 9 T — (6i,...,0d) and f(x) T = (fi{x), . . . , fd(x)), where fi(x) is some real-valued 
function for i = 1, . . . ,d. Let the drift 9 T f(x) be such that the regularity conditions 
mentioned in Section 2 are satisfied. Let the prior for = (0i, . . . ,0d) be multivariate 
Gaussian with mean vector and variance matrix, respectively, given by 



Mi 

/'•2 



and S 



'An A12 
A12 A22 



Aid" 

A2d 



-Pd- 



-Alrf \2d 



Then, the distribution of 0, given the diffusion A[ ,t] — ^[o,t]i is still Gaussian, with 
mean and covariance matrix, respectively, given by 



Si 

s 2 

Sd 



and E 7 



Ln L12 
L\2 L22 



nd J^2d 



Lid 

L2d 



^dd 



(11) 
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where, for i = 1, . . . , d and j = 1, . . . , d, 

1 f T d ^ \ r T 

S t :=—I fi(xt)dx t + y2XijfXj, Lij-.= —2 fi(xt)fj(x t )dt + Xij. 
a Jo - =1 a Jo 

The update of can thus be performed by sampling directly from this conditional 
distribution. 

The update of the diffusion X[o,t] is less straightforward and requires an appropriate 
Metropolis-Hastings step. It is possible, for example, to carry out an independence sam- 
pler with proposal distribution given by a Brownian motion starting at xq- To improve 
the acceptance rate of the move that updates the diffusion path, we apply the following 
updating strategy. Let = t\ < ■ ■ ■ < t m = T. Instead of proposing a new diffusion path 
on the whole interval [0,T], we propose to change the trajectory only on a subinterval 
[ij,i,-l_2], keeping the rest of the diffusion fixed. To ensure continuity of the diffusion path, 
the proposal distribution for the new trajectory on the subinterval [^,^+2] is a Brown- 
ian bridge BB [U:ti+2 ](x u ,x ti+2 ) = {BB t (x u ,x u+2 ): U<t< t l+2 \, having as starting and 
ending points, respectively, the values X ti = Xt t and X ti+2 = Xt i+2 of the current diffu- 
sion. The proposed diffusion path x* T i is then given by {x^ = l(t ^ [tijti+a])^* + £ 
[U, ti+2])bbt(%ti j x t i+2 ) : t G [0,T]}, where bbt(xt i} Xt i+2 ) is the realization of the Brownian 
bridge BBu. tt+a -\(xt i ,Xt i+:i ). This move is accepted with probability 

1 A 9{bb [UM+2] (xu ; xt i+2 )\0) Kvi, Vn^J 
9{ x [u,u +2 ] \0) l(yi,. ■ ■ ,y n \x[o, y[n] ]) ' 

where g(xu. ;t ]\9) is given by Girsanov's formula restricted to the interval [£j,fj+2], that 
is, 

< \a\ r r u+2 e r f(x t ) A 1 (8 T f(x t )f \ 

9(*[u,t* a )\0) = expy t a2 da« - - ^ - 2 di|. 

The procedure is iterated for i = 1, . . . , m — 3. Note that the different blocks [ti, U+2] over- 
lap so that there are no time instants where the diffusion is kept fixed. For the same rea- 
son, the last block [t m _2,T] is updated by means of a Brownian motion Bu_ 2t T](xt m _ 2 ) 
starting at X tm _ 2 = Xt m _ 2 so that the value of the diffusion at T may vary. The ac- 
ceptance coefficient of the move that updates the last block is the same as in (12), 
with [U,U+ 2 ] = [t m - 2 ,T] and b[ tm _ 2iT ](x tm _ 2 ) in place of 6&[ tijt . +2 ](a; ti ,xt i+2 ), where 
b[t m - 2 ,T]( x t m ^ 2 ) is the realization of the Brownian motion B^ tm _ 2t j^(xt m _ 2 ) ■ 

This idea of updating smaller intervals at a time has been used in Shcphard and Pitt 
(1997) for the simulation of non-Gaussian time series models and later applied for the 
simulation of discretely observed diffusions, for example, by Elerian, Chib and Shephard 
(2001). 

In Section 3.2, we shall illustrate the implementation of this algorithm by means of 
a toy example. Note that in this section and in the following, we are considering base 
diffusions having drift linear in the parameter 9 simply for purposes of exposition. 
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3.2. Implementation of the algorithm: A toy example 

We show here the implementation of the algorithm described in Section 3.1, by means of 
a toy example. Consider the model based on the diffusion process satisfying the SDE 

dX t =e 1 sm(X t )dt + 9 2 dt + dB t , t>0, X a = 2, (13) 

with hazard function h(u) = u 2 . We simulate observations from this model for values of 
the parameters 0i = —1.4 and 02 = —1, and censoring time C = 0.9. In particular, we 
sample one realization x of the diffusion process satisfying (13), with 0\ = — 1.4 and 02 = 
— 1. We then simulate 200 i.i.d. observations from the corresponding distribution F x ^ = 
1 — exp{— f*(x s ) 2 ds} and censor the observations at a common cut-off C = 0.9. The 
diffusion is sampled at intervals of length 0.01, using Euler-Maruyama approximation. 
Figure 1 shows the corresponding hazards (the squared diffusion) and a histogram of 
sampled data. The hazard function has a typical shape, first (mainly) increasing and 
then (mainly) decreasing. 

We choose as time horizon of interest T = 1. We then run the Hastings- within-Gibbs 
algorithm under the following specifications. The prior for {0\,02) is Gaussian, as in 
Section 3.1, with /ii = —1.4, /i 2 = —1, An = A22 = 1/5 and A12 = 0. The starting values of 
the parameters are 0\ = O2 = and the starting diffusion is a Brownian motion, starting 
at xq = 2. The diffusion path is updated on subintervals of length 0.2 at a time. The 
algorithm is run for 200 000 iterations and the first 2000 are discarded as burn- in. 

Figure 2 shows the estimates of survival distribution, density and hazard function, 
based on the MCMC output, together with pointwise approximate 90% highest posterior 
bands. The true survival distribution and hazard function are also displayed to demon- 
strate the good fit of the MCMC estimates. Figure 2 also shows autocorrelation functions 
for 0i and 02 series. 
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Figure 2. Upper-left: true survival distribution 1 — F x x 2 (solid), together with its posterior 
mean (dashed) and pointwise approximate 90% highest posterior bands (dotted). Upper-right: 
true density (solid), together with its posterior mean (dashed) and pointwise approximate 90% 
highest posterior bands (dotted). Lower-left: true hazard function x 2 (solid), together with 
its posterior mean (dashed) and pointwise approximate 90% highest posterior bands (dotted). 
Lower-right: autocorrelation functions for 9i series (dotted) and 62 series (dashed). 



4. Reparametrizations of the latent diffusion models 

The MCMC algorithm described in the previous sections might have poor mixing prop- 
erties when we consider a finite-time horizon T significantly larger than the maximum of 
the data. This problem is evident in Figure 3. This figure shows the histogram of 200 i.i.d. 
observations from the distribution F x i t h, where x' is a new realization of the diffusion 
process satisfying the same SDE used in Section 3.2; also, the hazard function h and the 
censoring time C are the same. In this simulation, we have fixed a longer time horizon 
T = 1.8 and have then run the algorithm under the same specifications of Section 3.2. 
Figure 3 displays autocorrelation functions for 6\ and 62 series, which are not exponen- 
tially decreasing. With the same data set, but choosing a shorter time horizon (such as 
T = 1 , as in the previous section) , the algorithm does not exhibit strong serial correlation 
in the draws of 9\ and O2 ■ The worsening of the mixing properties of the algorithm when 
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Figure 3. Left: histogram of data sampled from F x , x i2 with censoring at C = 0.9. Right: 
autocorrelation functions for 8i series (dotted) and 62 series (dashed). 

T becomes significantly larger than the maximum of the data was also observed for the 
data set simulated in Section 3.2. 

To avoid this problem, we propose a modification of the algorithm which has good mix- 
ing properties, regardless of the choice of time horizon, and is, in fact, completely robust 
with respect to T. The algorithm is based on a simple reparametrization of the model. 
Indeed, the performance of MCMC methods, particularly when using Gibbs samplers, 
depends crucially on the parametrization of the unknown quantities in the hierarchical 
structure. The issue of reparametrization of the posterior distributions in order to improve 
convergence properties of the algorithms has received much attention. See, for exam- 
ple, Hills and Smith (1992), Gelfand, Sahu and Carlin (1995), Gclfand, Sahu and Carlin 
(1996) and Papaspiliopoulos, Roberts and Skold (2003, 2007). 

Instead of using the natural parametrization of the model in terms of (0,X), the 
so-called centered parametrization, we parametrize it in terms of (Q,X), where 

X t = l(f < y [n] )X t + l(t > y [n] )[B t - B V[n] ], t > 0. 

In the terminology used by Papaspiliopoulos, Roberts and Skold (2003), this is called a 
partially non-centered parametrization, the fully non-centered parametrization being, in 
this case, (Q,B). The diffusion X can then be reconstructed as a function of 0, X and 

yi,---,y n , by 

X t =X t , 0<t<y [nh 
dX t =p(X u Q)dt + o-dX t , t>y [n] . 

The joint posterior distribution of and X has density, with respect to the product 
measure C d ® W CT , given by 

w(0,x\yi,. . .,y n ) = Cpe(6)g(x [0ty[n]] \0)l{yi,. . . ,y n \x[o, V[n] ]), (14) 
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Figure 4. Autocorrelation functions for 9\ series (dotted) and #2 series (dashed), obtained with 
the algorithm based on the centered parametrization (left) and with the algorithm based on the 
partially non-centered parametrization (right). 



where X[ 0]J/H ] = £[o, yw ], C is a normalizing constant and g{x^ y .A9) = dw ( x [o,y w ]) 

y [n] > CT 

is given by Girsanov's formula (2). Note, in particular, that (14) characterizes the pos- 
terior distribution of X, and thus the posterior distribution of the diffusion X, over the 
whole positive half-line. It thus also highlights that X[o,y [n ,] acts as sufficient statistics. 

It is possible to simulate from (14) by means of a Gibbs sampler quite similar to the one 
described in Section 3.1. However, the algorithm is now completely robust to the choice 
of T since the update of the parameter 0, conditionally on X, only involves X^ >y a. 

In the first step, in fact, we now simulate O conditionally on X^ ty a. In the second 

step, we simulate X over the time interval of interest, [0,T], conditionally on and the 
observations. In this case, we use a proposal distribution which is a Brownian motion 
starting at xo over the time interval [0, ?/[„]] and a Brownian motion starting at over 
the time interval [j/r„],T]. On [0, ?/[„]], we again follow the updating strategy with the 
overlapping Brownian bridges that was described in Section 3.1. When reconstructing 
the diffusion -X"[o,t] from G and ^[o,t] > we are careful to preserve the continuity of the 
diffusion path at time y\ n ] . Details are omitted. 

Figures 4 and 5 compare mixing and MCMC estimates obtained with the algorithms 
based on the centered parametrization and on the partially non-centered parametrization 
for the data set corresponding to Figure 3. The specifications of the two algorithms are 
as in Section 3.2. Note that the hazard function is bathtub shaped. Hazard functions 
with such a shape are quite common in survival analysis (think, for instance, of human 
mortality) . 

As we shall see in Section 6, another reparametrization of the model, one that turns 
out to be useful in the presence of covariates, is the fully non-centered parametrization in 
terms of (<d,B). The diffusion X can be reconstructed as a function of and B, simply 
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Figure 5. Top: true survival distribution distribution 1 — F x i x i2 (solid), together with its pos- 
terior mean (dashed) and pointwise approximate 90% highest posterior bands (dotted), obtained 
with the algorithm based on the centered parametrization (left) and with the algorithm based 
on the partially non-centered parametrization (right). Bottom: true hazard function x' 2 (solid), 
together with its posterior mean (dashed) and pointwise approximate 90% highest posterior 
bands (dotted), obtained with the algorithm based on the centered parametrization (left) and 
with the algorithm based on the partially non-centered parametrization (right). 



by the SDE 

dX t =P(X t ,Q)dt + adB t , t>0, X = x . 

The joint posterior distribution of O and B has density, with respect to the product 
measure C d <g) W CT , given by 

tt(6), %i, . . . ,y n ) = Cp e (d)l(y 1 , y n \6, &[o l!/w ]), (15) 

where C is a normalizing constant and l(yi, ■ . ■ , y n \@, &[o,s/r n ]]) = Kvii ■ ■ ■ j 2/nl x [o, »/[„]]) ^ s as 
in (4). Note that, similarly to what has been observed for the partially non-centered 
parametrization, (15) also characterizes the posterior distribution of the diffusion X over 
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the whole positive half-line. Moreover, the Gibbs sampler that simulates from (15) is also 
completely robust with respect to the choice of the time horizon T. In the first step, we 
simulate 9 conditionally on -B[o,j/r „,] and the observations. Note, in particular, that the 
conditional distribution of O, given S[o.t] an d the observations, now has density, with 
respect to C d , proportional to pe(0)l(yi , . . . , y n \9, b[o,y [n] ])- I n the second step, we simu- 
late B over the time interval of interest, [0,T], conditionally on and the observations. 
For proposal distribution, we use a Brownian motion starting at and we employ the 
updating strategy based on overlapping Brownian bridges. In this case, when updating 
the Brownian motion path b over the subinterval [U, ^+2] , we need to reconstruct the cor- 
responding diffusion path x over the subinterval [ti,T] in order to preserve the continuity 
of the diffusion path at time U + 2 ■ Details are omitted. 



5. Latent diffusion models for multiple groups of 
observations 

We now discuss a straightforward generalization of the framework developed in the previ- 
ous sections and deal with the case of multiple groups of observations, where the observa- 
tions within each group are taken under homogeneous conditions. Consider, for example, 
the case in which different treatments are being administered to different groups of pa- 
tients in a clinical trial. 

Given = 9, let I'M,... be q stochastically independent diffusion processes sat- 
isfying (1) and F X [i\ h ,... ,F X [i] h the relative random distributions, as in (3). Now, con- 
sider q sequences of observations (l^ 1 )„, . . . , (y,l 9 ')n such that the random variables in 
(,(Yn^)ni ■ ■ ■ j (Xn^)n) a rc conditionally independent, given F X [i\ h , . . . , F x iq] hi an d the 
random variables in (Yn k ^) n have common distribution F X [k] h for k = 1, . . . , q. 

The joint distribution of y} 1] Y$ , . . . , Y[ q] , . . . , Y^f , given = X W , . . . , jM = 
, has density, with respect to C n (where n = n\ H 1- n q ), given by 

l( y W [1]. [9] v [<i]\ x K M \-T\l( v W \ 

k=l 

where y [nk] := max{yf ] , . . . ,y l n ] k } and l{yf\ . . . ,y [ nl |»p;„ [n ,]) is as in ( 4 )- Usin g the P ar " 
tially non-centered parametrization described in Section 4, the joint posterior distribution 
of 6 and X^\ . . . , has density, with respect to the product measure C d <g> W 9 ,, given 
by 

n(e,^\...,x^\...,y^...;y^,..., y ^) 

(16) 



C Pe (9) 



.k=l 
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dP «r , , 



-(■r 



[k] 



is given by 



where C is a normalizing constant and g(x^ y 

Girsanov's formula (2). 

The contributions of the q groups of observations factorize in (16) and a simple modi- 
fication of the MCMC algorithm presented in the previous sections may be used to deal 
with this case. Let Tx,...,T q be the time horizons of interest for the q groups, with 
Tk > U[n k } f° r k = 1, . . . ,q. The Hastings-within-Gibbs algorithm for sampling from (16) 
alternates between 

1. simulation of 0, conditional on the current paths of X 

2. for each k in {l,...,q} 



[i] 



X 



[fc] 



, Yn} and the current value of 0. 



simulation of xl^ Tk 



conditional on the observations 



Consider, for example, a latent diffusion model with q stochastically independent dif- 
fusion processes, XW,...,Xb\ satisfying the SDE (10). Choose the same multivariate 
Gaussian prior for that was used in Section 3.1. Then, the distribution of 0, given 
xll i = x3 i, . . . , X$ , = x\n i, is still Gaussian, with mean vector and co- 

[0,!/[ ni ]] [0,J/[ni]]' ' V>,V[n q ]] [0.l/[n,]]' ' 

variance matrix as in (11), but with 



1 



f' 



k=l 



3=1 



for i = 1, . . . , d, j = 1 , . . . , d. The update of the parameter can thus be performed by 
sampling directly from this conditional distribution. The second step may be carried out 
by q repetitions of the updating mechanism described in Sections 3.1 and 4. 

Note that we are here considering a simple hierarchical structure, where inference on 
the separate groups is linked only at the level of the finite-dimensional parameter 0. For 
some applications, this might allow too little borrowing of strength for inference across 
groups of patients. In Section 6, we shall instead describe a more complex hierarchi- 
cal structure, suitable in the presence of covariatcs and allowing for a much stronger 
borrowing of strength for inference across individuals. 



5.1. An illustrative application to a real data set with multiple 
groups of observations 

In this section, we show the implementation of the latent diffusion model for multiple 
groups of observations via an illustrative application to a small data set from a clinical 
trial, one that has been considered in a number of papers in the context of survival 
analysis, among them Gehan (1965), Cox (1972), Wei (1984) and Xu and O'Quiglcy 
(2000) in the non-Bayesian literature and Kalbflcisch (1978), Laud, Damien and Smith 
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(1998) and Damien and Walker (2002) in the Bayesian literature. In the trial, reported by 
Freireich (1963), 6-mercaptopurinc (6-MP) was compared to a placebo in the maintenance 
of remission in acute leukemia. The following lengths of remission in weeks were recorded 
for 42 patients, half of which treated with the 6-MP drug and half with the placebo (a 
+ sign indicates a censored observation): 

6-MP: 6, 6, 6, 6+, 7, 9+, 10, 10+, 11+, 13, 16, 17+, 19+, 20+, 22, 23, 25+, 32+, 
32+, 34+, 35+, 
placebo: 1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11,11, 12, 12, 15, 17, 22, 23. 

We thus consider a model for two groups of observations, namely the 6-MP drug group 
and the placebo group. As latent diffusion model, we shall use the stochastic perturbation 
around the Weibull described in Section 2.1. Recall that this model has base diffusion 
satisfying the SDE 

dX t = 9 1 (sign(X t ))\X t f 2 dt + adB t , t>0, X o = x >0, 

and hazard function h(u) = \u\. 

We express the data as fractions of one year and choose as time horizons of interest 
T\ = Ti = 0.75, corresponding to 9 months (39 weeks). We take ©i and 02 to be a priori 
independent, with a Gaussian prior distribution for 0i, mean fj, = 0, variance 1/A = 5, and 
a uniform prior over [0, 1] for 02. Moreover, we set xq = 0.8 and a = 8. We then run the 
Hastings-within-Gibbs algorithm based on the partially non-centered parametrization. 
The update of 0i is performed by sampling directly from the conditional distribution 
0i, given 02,-X"jq^ yX^ y p which is still Gaussian with mean ^q^f and variance 

l+a> where 



S:=- 
a 1 



i 

E 

.7=1 



" 3 ((sign(4 jl ))|4 J Y 2 )dx' jl 



and L := — 
a 1 



E 



For the update of 02 , we use an independence sampler with a Beta proposal distribution, 
with parameters (1/2,1/2). The update of XM and X^ is carried out as described in 
the previous sections. The algorithm is run for 200000 iterations and the first 2000 are 
discarded as burn-in. 

Figure 6 displays the MCMC estimates of the survival distributions of the two groups, 
6-MP drug and placebo, together with the relative Kaplan-Meier curves. Note that the 
MCMC estimates of the two survival distributions are closer to one another than the 
two Kaplan-Meier curves, thus indicating borrowing of strength for inference among 
the two groups. Hence, the latent diffusion model, which gains much flexibility over a 
fully parametric model by introducing randomness around it, docs not suffer from the 
opposite problem of being too data-driven. Figure 6 also displays the MCMC estimates 
of the hazards of the two groups. 

We could now verify the efficacy of 6-MP drug treatment as proposed in Damien and Walker 
(2002). In particular, under the hypothesis that the 6-MP drug is inefficient, we would 
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Figure 6. Left: posterior mean survival distributions and pointwise approximate 90% highest 
posterior bands for the group of patients treated with 6-MP drug (solid) and for the group of 
patients treated with the placebo (dashed), together with corresponding Kaplan-Meier curves. 
Right: posterior mean hazards for the group of patients treated with 6-MP drug (solid) and for 
the group of patients treated with the placebo (dashed) . 



regard all patients as belonging to a single group, instead of two. We could then imple- 
ment the latent diffusion model based on the stochastic perturbation of the Weibull, but 
with just one diffusion process. Let Mi denote the model where all patients belong to 
a single group (corresponding to the hypothesis Hi of null efficacy of the 6-MP drug) 
and let M 2 denote the model considered above (corresponding to the hypothesis H 2 of 
efficacy of the 6-MP drug) . If the a priori probabilities of hypotheses Hi and H 2 are set 
equal to 0.5, the Bayes factor 

probability density of data under model Mi 

BF — — ; 

probability density of data under model M 2 

gives the posterior odds in favor of Hi. As expected, the computed Bayes factor (BF = 
9 x 10~ 6 ) provides strong evidence for the efficacy of the 6-MP drug. 



6. Latent diffusion models with covariates 

Covariates can be included in the latent diffusion models described in a very natural 
way, as directly influencing the underlying diffusion. For instance, if Z is a vector of p 
covariates measured at time 0, we can use the model based on the diffusion satisfying 
the SDE 

dX t = /3(X t ,z,e)dt + adB t , t>0, 

(17) 

X = x {z,6). 

In particular, following suggestions of Aalen and Gjcssing (2001) and 
Aalen, Borgan and Gjessing (2008) for barrier hitting models, those covariates which 
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represent measures of how far the underlying process that leads to the event has ad- 
vanced (such as staging measures in cancer) may be taken to influence the starting 
point of the diffusion. Those covariates which instead represent causal influence on the 
development of the process may be taken to influence the drift of the diffusion. 

Let z take values zW , . . . , z^ . Then, (17) gives q different diffusions, X^= zW \ . . . ,X^= z[q] \ 
driven by the same Brownian motion B, with 

&X [ r* [k]] =(3(xl z=zlk] \zW,6)dt + adB t , t>0, 
X = x (zW) 

for k = 1, . . . , q. Denote by F x[2 . = ji] ] h , . . . , i^^^Mj h the relative random distributions, 

as in (3). Moreover, denote by Y^~ 2 Yn Z k ~ z ' the survival times of the individ- 

uals having covariates z = z^ for k = 1, . . . , q. The survival times Y^ z_z Yn~ z ' , 

conditionally on F x{2 . =2 .{k] ] , , are i.i.d. with common distribution i*V [z=z [fc]j , . Since the q 
diffusions are driven by the same Brownian motion, it is here more natural to use the 
fully non-centered parametrization of the model, described in Section 4. In particular, 

the joint distribution of Y{ , . . . , Yr^ , . . . , Y j 1 , . . . , Y„ q , given B = b and 
= 6, has density, with respect to £™ (where n = fix H h n q ), given by 

%^..., y i z r%-^^^ 
=n/( y r z[fc \...,^r [fcIi i^w fcll ^ lfel ) J 

k=l 

where y [n] := ma,x{y 1 , . . . , y„}, y[ ntt \ := max{y[ z_zI , . . . , y£~* [ 1 } and 

l(v lz=zlk]] v [z=zlki] \9 6m ^H)-!^ 1 ^' 1 w l z = zW ]|x [z=z[fcI h 

is as in (4). The joint posterior distribution of and B has density, with respect to the 
product measure C d ® W CT , given by 

n(e, %[■= z[11 ' , . . . , ,irr ; • ■ • ; y [ r zW] \ . . . , <= z 1911 ; zW . • ■ • . » w ) 

(18) 

= cps{0) n Ky[ z=zW] , • • • , v [ :r lk]] & V w zW )• 

fe=l 

Note that this model is structurally different from the model for multiple groups of 
observations described in Section 5 since the distributions of the survival times are 
here linked at the level of the Brownian motion, allowing a much stronger borrowing 
of strength for inference across individuals who share a common value of even just one 
of the p covariates. 
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As usual, we denote by T the time horizon of interest, T > j/r n i. The Hastings- within- 
Gibbs algorithm for sampling from (18) alternates between 

1. simulation of 0, conditional on the current path of B^ y . ,], the observations and 
the covariates; 

2. simulation of -E>[o,t] ; conditional on the current value of 0, the observations and the 
covariates. 

In particular, the update of the Brownian motion -B[o,t] can be carried out via the 
updating strategy based on overlapping Brownian bridges, as described in Section 4. 

6.1. An illustrative application to a real- world data set with 
covariates 

In this section, we illustrate how to efficiently handle the model with covariates via an 
application to a data set concerning 272 patients diagnosed with non-small cell lung 
cancer. The data set is described in detail in Muers, Shcvlin and Brown (1996). Survival 
times are measured in months from the time of diagnosis (with 17% of censoring) and 
some covariates are recorded at the time of diagnosis. Just to give an illustration of the 
model, we shall consider here two covariates: sex (F = 0: male and F — 1: female) and 
hoarseness (H = 0: absent and H = 1: present). Using, for instance, the model based 
on the stochastic perturbation around the Weibull, we can include these covariates as 
follows: 

dX t = cxp{6 w + 9 11 F}( f ugp{Xt))\Xt\ e > dt + <rdB u t > 0, 
X =exp{6 00 + 6 01 F + 9 02 H}. 

Note that, following the suggestion of Aalen, Borgan and Gjcssing (2008), we have mod- 
eled the covariate hoarseness, which only represents a measure of how far the lung tumor 
has advanced, as influencing the starting point of the diffusion; we have instead taken 
the covariate sex to influence both the starting point and the drift of the diffusion, in 
order to account for possible differences between males and females, both in the hazards 
at time of diagnosis and in the hazard dynamics. The covariate combinations determine 
four different diffusions, X^ F=0H=0 \ Xl F =°> H=1 \ X^ F=1 ' H =^ and Xl F = l > H=1 \ driven 
by the same Brownian motion. According to this model, the hazard at time (the time 
of diagnosis) of patients suffering from hoarseness is exp{#02} times that of patients not 
suffering from hoarseness and the hazard at time of female patients is exp{#oi} times 
that of male patients; moreover, exp{#n} gives a measure of the different progression 
rate of the cancer in female patients with respect to male patients. 

We express the data as fractions of a quadrcnnium and choose as time horizon T 
the maximum of the observations, corresponding to about 37 months. In order to avoid 
dependencies among the (#00j ^01,^02) parameters and among the (6*10, #11) parameters, 
we reparametrize them in terms of (7700, $01, ^02) and (7710, #11), with 9qq = r/oo — Pf^qi — 
PH&02 and 6*io = ?/io — Pf&ii, where we have denoted by pp and pn the percentage of 
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Figure 7. Left: posterior mean survival distributions, together with Kaplan-Meier curves, for 
male patients without hoarseness at time of diagnosis (F — 0, H — 0, solid line), for male patients 
with hoarseness (F = 0, H = 1, dotted and dashed line), for female patients without hoarseness 
(F — 1, H — 0, dashed line) and for female patients with hoarseness (F — 1, H — 1, dotted line). 
Right: the same for posterior mean hazard functions. 



females patients and the percentage of patients suffering from hoarseness, respectively. 
We take all of the parameters to be a priori independent, with Gaussian priors with 
mean and variance 5 for all parameters except 02, for which we use a uniform prior 
over [0,1]. Moreover, we set a = 8. We then run the Hastings- within-Gibbs algorithm 
based on the non-centered parametrization of the model. The update of the parameters 
is performed via independence samplers having proposal distributions equal to the priors. 
The algorithm is run for 200 000 iterations and the first 2000 are discarded as burn-in. 

Figure 7 shows posterior mean survival distributions, together with Kaplan-Meier 
curves, for male patients without hoarseness at time of diagnosis (F = 0,H = 0, solid 
line), for male patients with hoarseness (F = 0, H — 1, dotted and dashed line), for female 
patients without hoarseness (F = 1,H = 0, dashed line) and for female patients with 
hoarseness (F = 1,H = 1, dotted line). The four survivals are also plotted separately in 
Figure 8 with 90% highest posterior bands. Figure 7 also displays the posterior mean 
hazard functions for the four covariate combinations. In particular, the posterior mean 
hazard at time of patients suffering from hoarseness is 2.2 times bigger than that of 
patients not suffering from hoarseness, whereas the hazard at time of female patients 
is 0.6 times that of male patients. 

Note that even though we have only considered categorical covariatcs in this illustra- 
tive application, quantitative covariates can also be included in the model; however, it 
may be necessary to categorize these covariates in order to have a sufficient number of 
observations for each of the diffusion processes. This, of course, requires larger data sets. 
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Figure 8. Upper-left: posterior mean survival distribution and pointwise approximate 90% 
highest posterior bands, together with Kaplan-Meier curve, for male patients without hoarse- 
ness. Upper-right: the same for male patients with hoarseness. Lower-left: the same for female 
patients without hoarseness. Lower-right: the same for female patients with hoarseness. 



7. Generalization to the case of unknown diffusion 
coefficient 

An important generalization of the models we have considered thus far consists of consid- 
ering diffusion processes with unknown diffusion coefficient a since a describes a natural 
measure of prior uncertainty. We briefly discuss how to deal with this case. 

Let S be a real random variable. Given = 9 and E = a, consider the scalar diffusion 
process X solution of the SDE (1) and denote by Pr,e,<r the law of AT[o,t]- Let ps{-) be the 
prior density, with respect to C, of E (for simplicity, we take and E to be stochastically 
independent a priori). Let us consider, for instance, the centered parametrization of 
the model. The joint posterior distribution of (0,E,X[ Oj t]) has density, with respect to 
C d+1 ®Wt>, given by 

o-, X[o,t] 1 3/i> • ■ • ' y«) = c PQ{ s )Pr.W)g{x[Q,T\ \0, <r)l{yi, ■ ■ ■ > yn\x[o,y ln] ]), (19) 
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where C is a normalizing constant and fl , (x[o,T] \0, c) := d ^ T "'° (%[o.r]) is given by Gir- 
sanov's formula (2). 

The quadratic variation of a diffusion processes, having diffusion coefficient a, satisfies 

m 

lim y^( x u/m - X t{i _ l)/m f = to- 2 , W T ,<7-a.s. for all t. 

i=l 

Therefore, the conditional distribution of E, given the diffusion Xi 0>t -i, degenerates to a 
point mass and E is completely determined by the diffusion path. In practice, we cannot 
simulate the diffusion path in continuous time, but just at discrete time instants. In any 
case, the finer the discrete-time approximation {XiT/m'- i = l,...,m} of the diffusion 
Xr 0j T]j the stronger the dependence between {XiT/m'- i = 1, .. ., m} and E. Consider the 
algorithm for the simulation from (19) that alternates between: 

1. simulation of 0, conditional on the current value of E and the current path of ^[o,t] ! 

2. simulation of E, conditional on the current value of and the current path of ^[o,t] ! 

3. simulation of X[ 0i t], conditional on the observations and the current values of 
and E. 

The finer the approximation of the diffusion path, the worse the convergence of the 
algorithm becomes. In the limiting case m = oo (i.e., if the diffusion process could be 
simulated in continuous time), this scheme would be reducible; see Roberts and Stramer 
(2001). An alternative way to see this problem is to note that the collection of measures 
{Wt, ct : cr G K} are mutually singular and, therefore, so are the measures {Pt,6>,<t : a £ 

In this case, the need for a different parametrization of the model is thus compelling. 
Following Roberts and Stramer (2001), we parametrize the model in terms of (0,E,X), 
where X t = (X t — Xo)/E. By Ito's formula, 

dX t = ^ X £ &) dt + dB t , t>0, X =0. 

The distribution of -X^o.t] depends on E, but any realization of X[ t] contains only 
finite information about E. Analogous reparametrizations are derived starting from the 
ones described in Section 4. MCMC algorithms based on these reparametrizations can 
be obtained as simple modifications of the ones previously described. 

Consider the toy example described in Section 3.2 and assume the same model, but let 
the diffusion process have an unknown diffusion coefficient. Let the prior for this coeffi- 
cient be exponential with mean 1. Figure 9 displays the results obtained with the MCMC 
algorithm based on the reparametrization (0,E,A"). Specifications of the algorithm are 
as in Section 3.2. Note that the mixing for a is slow relative to the very good mixing for 
9\ and 02, but this docs not prevent good estimates of the survival distribution, density 
and hazard being obtained. Slow mixing for a could probably be improved by a further 
reparametrization of the model. 

Alternatively to the case of an unknown diffusion coefficient, it would be possible to 
consider models based on diffusion processes having a = 1, but with hazard function 
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Figure 9. This corresponds to Figure 2, but for the model with unknown diffusion coefficient. 
The lower-right plot also displays the autocorrelation function for a series (dotted-dashed line). 

h(T,X), where T is a random parameter. A rcparamctrization of the model would also 
be necessary in this case. 

8. Discussion 

In this paper, we have described latent diffusion models for survival analysis and have 
shown that these models can be efficiently treated by means of MCMC techniques. We 
have dealt with the case of multiple groups of observations, typical of clinical trials, and 
we have shown how covariates can be efficiently included in the models. We have outlined 
how, in the described framework, it is possible to consider stochastic perturbations of 
common survival models. In particular, we have used a stochastic perturbation of the 
Weibull model in some illustrative applications to small data sets, with multiple groups 
of observations and with covariates. Applications to larger data sets, where the potential 
of a latent diffusion model may be fully expressed, will be the object of future work. All 
analyses presented are computationally feasible within R (see R Development Core Team 
(2007)). 
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Another generalization of the model we intend to explore regards random probabilities 
based on jump diffusion processes. As observed in Section 2, the cumulative hazard 
functions associated with random probabilities based on diffusions are smooth, being the 
integrals of continuous processes. By replacing the diffusion process with a jump diffusion 
process, it would be possible to capture sudden changes in the behavior of cumulative 
hazards that might be due to some kind of shock experienced by the population. Hazards 
modeled through stochastic processes with jumps have been studied, for instance, by 
Gjessing, Aalen and Hjort (2003). 
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